An Integrated Approach to Heterogeneous Data for Information Extraction
نویسندگان
چکیده
The paper proposes an integrated framework for web personal information extraction, such as biographical information and occupation, and those kinds of information are necessary to further construct a social network (a kind of semantic web) for a person. As web data is heterogeneous in nature, most of IE systems, regardless of named entity recognition (NER) or relation detection and recognition (RDR) systems, fail to get reliably robust results. We propose a flexible framework, which can effectively complement stateof-the-art statistical IE systems with rule-based IE systems for web data, and achieves substantial improvement over other existing systems. In particular, in our current experiment, both the rule-based IE system, which is designed according to some web specific expression patterns, and the statistical IE systems, which are developed for some homogeneous corpora, are sensitive only to specific information types. Hence we argue that our system performance can be incrementally improved when new and effective IE systems are added into our framework.
منابع مشابه
An Integrated DEA and Data Mining Approach for Performance Assessment
This paper presents a data envelopment analysis (DEA) model combined with Bootstrapping to assess performance of one of the Data mining Algorithms. We applied a two-step process for performance productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...
متن کاملZ-Cognitive Map: An Integrated Cognitive Maps and Z-Numbers Approach under Cognitive Information
Usually, in real-world engineering problems, there are different types of uncertainties about the studied variables, which can be due to the specific variables under investigation or interaction between them. Fuzzy cognitive maps, which addresses the cause-effect relation between variables, is one of the most common models for better understanding of the problems, especially when the quantitati...
متن کاملData Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
متن کاملFrom Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files
Log files generated by computational systems contain relevant and essential information. In some application areas like the design of integrated circuits, log files generated by design tools contain information which can be used in management information systems to evaluate the final products. However, the complexity of such textual data raises some challenges concerning the extraction of infor...
متن کاملData Mining , Validation , and Collaborative Knowledge Capture
For large-scale data mining, utilizing data from ubiquitous and mixed-structured data sources, the extraction and integration into a comprehensive data-warehouse is usually of prime importance. Then, appropriate methods for validation and potential refinement are essential. This chapter describes an approach for integrating data mining, information extraction, and validation with collaborative ...
متن کامل